Integrating human knowledge within a hybrid clustering-classification scheme for detecting patterns within large movement data sets
Abstract
The visual analysis of large movement data sets can be a challenging task. This study proposes an approach for identifying interesting movement patterns that combines human knowledge and decision making with a hybrid clustering-classification method. Rather than performing an unsupervised clustering of the entire data set, a stratified random sample of the full data set is used to identify initial clusters, which are verified and labelled by the analyst and then used as input patterns for classifying the remainder of the data set using an iterative genetic program. Classifications suggested after each iteration are presented to the analyst for refinement based on their knowledge and experience. A geovisual analytics environment is provided both to show the outcomes of the clustering and classification and to obtain the analyst’s input during the hybrid clustering-classification process. Our approach allows data to be classified without a priori specification of classification patterns. Instead, the process takes advantage of human decision making within the automatic analysis of the data. The approach was tested with fishing vessel movement data in Eastern Canada.
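The workflow described in the abstract (sample, cluster, have the analyst label the clusters, classify the rest) can be illustrated with a rough sketch. This is not the authors' method: the abstract names an iterative genetic program with analyst refinement after each iteration, whereas the sketch below swaps in plain k-means for the clustering step and a single-pass nearest-centre rule for the classification step, purely to keep the example short. The record layout, the feature choice, the threshold in `analyst_label`, and every function name are assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, fraction, rng):
    """Draw the same fraction (at least one record) from every stratum."""
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        size = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, size))
    return sample

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    return tuple(sum(axis) / len(points) for axis in zip(*points))

def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic farthest-point seeding (a stand-in clusterer)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # Keep the old centre if a group ends up empty.
        centers = [mean_point(g) if g else centers[j] for j, g in enumerate(groups)]
    return centers

def classify(features, labelled_centers):
    """Return the label of the nearest analyst-labelled cluster centre."""
    label, _ = min(labelled_centers, key=lambda lc: sq_dist(features, lc[1]))
    return label

def analyst_label(center):
    # Hypothetical stand-in for the analyst's judgement in the visual interface.
    return "fishing" if center[0] < 6.0 else "transit"

# Synthetic records: (vessel_id, speed_knots, heading_change_deg); all values are made up.
records = []
for i in range(20):
    vessel = "A" if i % 2 == 0 else "B"
    records.append((vessel, 2.0 + 0.1 * i, 40.0 + i))         # slow, winding track
    records.append((vessel, 10.0 + 0.1 * i, 2.0 + 0.1 * i))   # fast, straight track

rng = random.Random(0)
sample = stratified_sample(records, lambda r: r[0], 0.5, rng)  # strata = vessels
centers = kmeans([r[1:] for r in sample], k=2)
labelled = [(analyst_label(c), c) for c in centers]

sampled = set(sample)
remainder = [r for r in records if r not in sampled]
predicted = {r: classify(r[1:], labelled) for r in remainder}
print(len(sample), "sampled;", len(remainder), "classified")
```

In an interactive setting, `analyst_label` would be replaced by input collected through the visual environment, and the classification step would be repeated, with the analyst correcting the suggested labels between rounds.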
Similar resources
A novel local search method for microaggregation
In this paper, we propose an effective microaggregation algorithm to produce more useful protected data for publishing. Microaggregation is mapped to a clustering problem with known minimum and maximum group size constraints. In this scheme, the goal is to cluster n records into groups of at least k and at most 2k-1 records, such that the sum of the within-group squ...
Knowledge discovery from patients’ behavior via clustering-classification algorithms based on weighted eRFM and CLV model: An empirical study in public health care services
The rapid growth of information technology (IT) motivates and creates competitive advantages in the health care industry. Nowadays, many hospitals try to build successful customer relationship management (CRM) to recognize target and potential patients, increase patient loyalty and satisfaction, and finally maximize their profitability. Many hospitals have large data warehouses containing customer ...
Oil Reservoirs Classification Using Fuzzy Clustering (RESEARCH NOTE)
Enhanced Oil Recovery (EOR) is a well-known method to increase oil production from oil reservoirs. Applying EOR to a new reservoir is a costly and time consuming process. Incorporating available knowledge of oil reservoirs in the EOR process eliminates these costs and saves operational time and work. This work presents a universal method to apply EOR to reservoirs based on the available data by...
Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms
In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...